Hierarchical Clause Annotation: Building a Clause-Level Corpus for Semantic Parsing with Complex Sentences

Abstract

Most natural-language-processing (NLP) tasks, such as semantic parsing, syntactic parsing, machine translation, and text summarization, suffer performance degradation when encountering long complex sentences. Previous works addressed the issue with the intuition of decomposing complex sentences and linking simple ones, as in rhetorical-structure-theory (RST)-style discourse parsing, split-and-rephrase (SPRP), text simplification (TS), simple sentence decomposition (SSD), etc. However, these works are not applicable to semantic parsing, such as abstract meaning representation (AMR) parsing and semantic dependency parsing, because they misalign with semantic relations and cannot preserve the original semantics. Following the same intuition and avoiding the deficiencies of previous works, we propose a novel framework, hierarchical clause annotation (HCA), for capturing the clausal structures of complex sentences, based on linguistic research on clause hierarchy. With the HCA framework, we annotated a large HCA corpus to explore the potential of integrating HCA structural features into semantic parsing with complex sentences. Moreover, we decomposed HCA into two subtasks, i.e., clause segmentation and clause parsing, and provide neural baseline models for more-silver annotations. In evaluating the proposed models on our manually annotated HCA dataset, clause segmentation and parsing performances resulted in 91.3% F1-scores and 88.5% Parseval scores, respectively. Given that the same model architectures were employed, the performance differences between the clause and discourse segmentation and parsing subtasks reflect differences between our HCA corpus and the compared discourse corpora, where our sentences contained more segment units and fewer interrelations than those in the compared corpora.

Similar resources

Robust clause boundary identification for corpus annotation

The paper describes a rule-based system for tagging clause boundaries, implemented for annotating the Estonian Reference Corpus of the University of Tartu, a collection of written texts containing ca. 245 million running words and available for querying via the Keeleveeb language portal. The system needs information about parts of speech and grammatical categories coded in the word-forms, i.e. it ta...

Building a Parallel Corpus for Monologues with Clause Alignment

Many studies have been reported in the domain of speech-to-speech machine translation systems for travel conversation use. Therefore, a large number of travel domain corpora have become available in recent years. From a wider viewpoint, speech-to-speech systems are required for many purposes other than travel conversation. One of these is monologues (e.g., TV news, lectures, technical presentat...

Interactive semantic analysis of Clause-Level Relationships

Natural Language Processing (NLP) systems usually require large amounts of pre-coded domain knowledge to perform semantic analysis automatically. Until repositories of such background knowledge are widely available, these systems may not scale up to non-trivial applications of NLP. This paper describes the design and implementation of a system that uses surface-syntactic information to interpre...

Parsing with Intraclausal Coordination and Clause Detection

Syntactic analysis, i.e., parsing of text is used during various tasks, e.g., machine translation, question answering, etc. The structure of a sentence is represented with a tree. Parsing long sentences is a difficult task. The motivation was to analyze sub-units of the sentence independently, which could improve the overall parsing accuracy. We developed a new parsing algorithm that includes i...

A corpus study of clause combination

We present a corpus-based investigation of cases of clause combination that can be expressed either through coordination or through subordination. We analyse the data with a two-step computational model which first distinguishes subordination from coordination and then determines the direction for cases of subordination. We find that a wide range of features help with the prediction, notably frequen...

Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13169412